Parallel BLAST on split databases

Author

  • David R. Mathog
Abstract

SUMMARY: BLAST programs often run on large SMP machines where multiple threads can work simultaneously and there is enough memory to cache the databases between program runs. A group of programs is described which allows comparable performance to be achieved with a Beowulf configuration in which no node has enough memory to cache a database but the cluster as an aggregate does. To achieve this result, databases are split into equal-sized pieces and stored locally on each node. Each query is run on all nodes in parallel, and the resultant BLAST output files from all nodes are merged to yield the final output.

AVAILABILITY: Source code is available from ftp://saf.bio.caltech.edu/
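
The split, search, and merge workflow described in the abstract can be illustrated with a minimal Python sketch. This is not the author's code: the function names (`split_balanced`, `toy_search`, `merge_hits`, `parallel_search`) are hypothetical, the k-mer overlap score is only a stand-in for a real BLAST run, and threads stand in for cluster nodes.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor


def split_balanced(records, n_pieces):
    """Greedily split (id, sequence) records into pieces of similar total length."""
    pieces = [[] for _ in range(n_pieces)]
    heap = [(0, i) for i in range(n_pieces)]  # (total residues so far, piece index)
    heapq.heapify(heap)
    # Place the longest sequences first, always into the currently smallest piece.
    for rec in sorted(records, key=lambda r: len(r[1]), reverse=True):
        total, idx = heapq.heappop(heap)
        pieces[idx].append(rec)
        heapq.heappush(heap, (total + len(rec[1]), idx))
    return pieces


def kmers(seq, k=3):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_search(query, piece):
    """Stand-in for one node's BLAST run (assumption): score = shared 3-mers."""
    q = kmers(query)
    hits = [(len(q & kmers(seq)), seq_id) for seq_id, seq in piece]
    return [h for h in hits if h[0] > 0]


def merge_hits(per_node_hits, top_n=10):
    """Combine the per-node hit lists and keep the best-scoring hits overall."""
    merged = [h for hits in per_node_hits for h in hits]
    merged.sort(key=lambda h: (-h[0], h[1]))  # best score first, ties by id
    return merged[:top_n]


def parallel_search(query, pieces, top_n=10):
    """Run the query against every piece concurrently, then merge the results."""
    with ThreadPoolExecutor(max_workers=len(pieces)) as pool:
        per_node = list(pool.map(lambda p: toy_search(query, p), pieces))
    return merge_hits(per_node, top_n)
```

Because the toy score does not depend on database size, merging the per-piece results gives the same ranking as searching the unsplit database. Real BLAST E-values do depend on the size of the search space, so an actual merge step would have to account for the full database size rather than the size of each piece.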

Similar resources

paraBLAST: A Highly Scalable Parallelized BLAST Solution

Programs of the NCBI BLAST family have been widely used for retrieving homologous sequences from existing databases. This article briefly introduces and evaluates a parallelized version of the BLAST algorithm, paraBLAST, using Message Passing Interface (MPI) on a multi-node compute cluster. A dynamical database fragmentation scheme based on the availability of a compute cluster is proposed. Its...

TurboBLAST(r): A Parallel Implementation of BLAST Built on the TurboHub

BLAST (Basic Local Alignment Search Tool) is by far the most widely used application for rapid screening of large sequence databases. This paper describes TurboBLAST, a parallel implementation of BLAST suitable for execution on networked clusters of heterogeneous PCs, workstations, or Macintosh computers.

BGBlast: A BLAST Grid Implementation with Database Self-Updating and Adaptive Replication

BLAST is probably the most used application in bioinformatics teams. BLAST complexity tends to be a concern when the query sequence sets and reference databases are large. Here we present BGBlast: an approach for handling the computational complexity of large BLAST executions by porting BLAST to the Grid platform, leveraging the power of the thousands of CPUs which compose the EGEE infrastructu...

A Local Sequence Alignment Algorithm Using an Associative Model of Parallel Computation

Local sequence alignment is widely used to discover structural and hence, functional similarities between biological sequences. While the faster heuristic methods like BLAST and FASTA are useful to compare a single sequence to hundreds or even thousands of sequences in genetic databases such as GenBank, EMBL, and DDBJ, this work yields pairwise alignments with a high sensitivity. The heuristic ...

PARALIGN: rapid and sensitive sequence similarity searches powered by parallel computing technology

PARALIGN is a rapid and sensitive similarity search tool for the identification of distantly related sequences in both nucleotide and amino acid sequence databases. Two algorithms are implemented, accelerated Smith-Waterman and ParAlign. The ParAlign algorithm is similar to Smith-Waterman in sensitivity, while as quick as BLAST for protein searches. A form of parallel computing technology known...

Journal title:
  • Bioinformatics

Volume 19, Issue 14

Pages -

Publication date: 2003